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Abstract. We develop statistical enumeration methods for self-avoiding walks using a 
powerful sampling technique called the multicanonical Monte Carlo method. Using these 
methods, we estimate the numbers of the two dimensional A'^-step self-avoiding walks up to 
N — 256 with statistical errors. The developed methods are based on statistical mechanical 
models of paths which include self-avoiding walks. The criterion for selecting a suitable model for 
enumerating self-avoiding walks is whether or not the configuration space of the model includes 
a set for which the number of the elements can be exactly counted. We call this set a scale 
fixing set. We selected the following two models which satisfy the criterion: the Go model for 
lattice proteins and the Domb-Joyce model for generalized random walks. There is a contrast 
between these two models in the structures of the configuration space. The configuration space 
of the Go model is defined as the universal set of self-avoiding walks, and the set of the ground 
state conformation provides a scale fixing set. On the other hand, the configuration space of 
the Domb-Joyce model is defined as the universal set of random walks which can be used as a 
scale fixing set, and the set of the ground state conformation is the same as the universal set 
of self-avoiding walks. From the perspective of enumeration performance, we conclude that the 
Domb-Joyce model is the better of the two. The reason for the performance difference is partly 
explained by the existence of the first-order phase transition of the Go model. 



1. Introduction 

A self-avoiding walk (SAW) is a path on a lattice with the spatial restriction that the path must 
not visit the same site more than once [1]. A SAW is used as a model of chain polymers, because 
the above restriction represents the excluded volume effect. Lattice protein models such as the 
Go model IH El [6] and the HP model [7] have played important roles in the construction of the 
funnel picture of the energy landscape O [8l [9] , which forms the basic theoretical understanding 
of protein folding. 

Many sophisticated methods have been developed to analyze the thermodynamic properties 
of a lattice protein. The multi-self-overlap ensemble (MSOE) [14:\ [15], which is an extended 
version of the multicanonical Monte Carlo method [W\ lllj. is one of the best methods to find a 
ground state and calculate thermodynamic properties of a long lattice protein. The main idea 
of the MSOE is that the self-avoiding condition is relaxed so that some intersections of a chain 



are allowed. In the present work, we apply the idea to the enumeration problem of finite-step 



Besides having many applications like polymers, the SAW itself has many interesting 
asymptotic behaviors of infinite steps and has been studied by physicists, chemists, and 
mathematicians for a long time. In spite of great efforts, major parts of proposed asymptotic 
behaviors have not been solved in the rigorous mathematical sense and remain as conjectures. 
Enumeration of A^-step SAWs is a famous unsolved problem and the exact number of two- 
dimensional SAWs is known up to only 71 steps [2]. The total number of A-step SAWs is 
written as cat, and C71 is counted as 



which is larger than Avogadro's constant (~ 6.0221 x 10^'^). Since cn increases exponentially 
with A, it will soon be impossible to deal with all components of A-step SAWs on a computer 
and we have to use statistical methods for approximate enumeration of cat with large A. 

In our study, in order to estimate cn with large A accurately, we developed new enumeration 
methods using multicanonical simulations of the following two kinds of statistical mechanical 
models: the Domb-Joyce model [3] for generalized random walks and the Go model [H El [6] for 
lattice proteins. By using our methods, we were able to estimate the number of SAWs on a 
square lattice up to 256 steps with error estimation. 

2. Models and Methods 

2.1. Self-avoiding walk (SAW) 

We denote points on a d-dimensional cubic lattice by a;(i) G {i = 0, 1, 2, • • • ) and a set of 
points by a path u. An A-step random walk (RW) is defined by w = (a;(0), a;(l), • • • ,u;(A)) 
starting from the origin of the lattice with the constraint \uj{i + \) — (jj{i)\ = 1 (i = 0, 1, • • • , A^— 1). 
We denote the universal set of RWs as {cjIrw Since each site has 2d nearest neighbors, the total 
number of A-step RWs is exactly {2d)^ . In the case of a SAW, a further constraint imposed by 
oj{i) 7^ uj{j) for all i ^ j and we denote the universal set of SAWs as {w}sAW- This constraint 
makes it difficult to exactly count the total number of A^-step SAWs cjy. 

2.2. Multicanonical Monte Carlo method 

Two enumeration methods proposed in this paper are based on the same sampling technique 
called the multicanonical Monte Carlo method [TT] . If we define Hamiltonian % as a, function 
which maps a path on a d-dimensional lattice to energy, % : uja ^ Ea, we can introduce an 
energy structure into {wIrw and {wjsAW- Using the multicanonical method, we can estimate 
the number of states Q{E) accurately over a wide range of energy, as we will explain in the 
following sections. 

Introducing a weight W{E) as a function of E into the Markov chain Monte Carlo, we define 
the transition probability from a path coa to another path cof, 



where Ea and Efy are energy of Ua and Wf,, respectively. To calculate thermodynamic values at 
specific inverse temperature /3, we can use the Metropolis' method with W{E) oc exp(—/3E). 
In the multicanonical method, the energy space of W{E) is divided into n bins and inverse 
temperature /3i (i = 1,2,- •• ,n) is introduced to each bin. W{E) of the ith bin is set to be 
proportional to exp{—(3iE), and joint parameters Ui {i = 1,2, ••• ,n — 1) are introduced to 
connect W{E) continuously at the boundary of the bins. By modifying /3j (i = 1, 2, • • • , n) and 
Oj (i = 1,2,- •• ,n — 1), we determine W{E) to be approximately proportional to the inverse 



SAWs. 
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(1) 



of the number of states 1/0,{E). If the number of energy levels n^; is finite and n is equal to 
ue, which is true throughout our study, the multicanonical method is identical to the entropic 
sampling [16J. 

In our study, we built up W{E) using the Wang-Landau method [12l [13], the detailed 
procedure of which is given in the Appendix. If we give a transition probability using this 
W{E), the Markov chain Monte Carlo produces a flat histogram H{E). We can obtain 
H{E)/W{E) oc Q,{E) with high accuracy over a wide range of the energy scale. In order to 
estimate the number of states, we need to introduce an absolute scale. In Sec. E] and [U 

we introduce two different absolute scales. 



2.3. Multi-self-overlap ensemble 

To efficiently sample SAWs from {wjsAW; we can use the multi-self-overlap ensemble 
(MSOE) [MllIS], which is an extended version of the multicanonical method. Relaxing the self- 
avoiding conditions, the MSOE makes it possible to explore the configuration space faster than 
the case that the self-avoiding conditions are strictly kept despite the extra configuration space. 
The reason for the fast exploration is that transition paths from one SAW to another increase 
via configurations with one intersection or more. Let V be the number of the intersections of a 
path. Using V and the prescribed cutoff, Vcut-, we denote the original configuration space and 
the expanded configuration space as {wlK^*^ and {wjJ^T^^'"" , respectively. 

In the same way as with the multicanonical method, we build up the weight function W{E, V). 
By using the transition probability 



p{u:a tOb) = min 



W{Eb,Vb) ■ 



(2) 



we obtain a two-dimensional fiat histogram, H(E, V). The number of {wjsAW can be obtained 
from the equation i}(E) cx H{E,0)/W{E,0). We used the MSOE in the first enumeration 
method using the Go model. The idea of using V as a variable is shared by the second 
enumeration method using the Domb-Joyce model, which has the fully expanded configuration 
space jwjRw (= 1^}^^ )• 



3. Multicanonical simulation of the Go model 

3.1. Go model 

In the first enumeration method, we estimate ctv by multicanonical simulations of the Go 
model [H El E]; which was introduced originally to investigate the protein folding problem 
theoretically. The Go model is defined as a SAW with specialized interactions that gives one 
conformation ojnativci called the native structure, as a unique ground state apart from the trivial 
spatial symmetry. Using two indices of points of w, i and j, a native contact pair is defined 
by a pair of i and j ( j > i + 1) which satisfies the condition |c<Jnative(^) — ^native{j)\ = 1- The 
Hamiltonian of the intrachain interactions can be written as 

^M= E -e6{\u;{t)-u;{j)\,l). (3) 

(i,j)G{nativo contact pairs} 

If we define nncp as the total number of native contact pairs, the ground state energy Eqs is 
written as — enncn- 



3.2. Designing a native state 

We calculate g{E) = H{E,0)/W{E,0) of the Go model using the MSOE. In order to estimate 
i}{E) from g{E), we need to know the number of states of at least one bin of Q{E). We design 
a ^native that is suitable for that purpose, leaving aside the protein folding problem. 



Selecting a compact structure which can maximize the number of the native contact pairs 
for Wnativej we Can easily count the number of ground states. In the case of d = 2 and N = 24, 
for example, if we choose the configuration shown in Fig. [T]-(a) ("roll" shape) or (b) ("beta" 
shape) as a native structure, the number of ground states is 8 for both cases considering spatial 
symmetry. Roll-shaped or beta-shaped native states of any N are made from the following rules. 



(a) 




(b) 




Figure 1. Two types of native states of the Go model (N = 24): (a) roll and (b) beta shape. 
The start point, a;(0), of each chain is shown as a white circle. 



I. Find L (g N) which satisfies the condition \/]V < L < \/]V -|- 1. 

II. Draw the L x L compact structure using a roll or beta shape (Fig. [2] for L = 2, 3, 4, 5). 

III. Select steps from the start point lj{0). This can be used as ^native for the A^-step Go 
model. 

IV. The number of ground states are determined as below. 
In the case of a roll shape. 



GS) 



16 when N = {[Vn\){\Vn] 
8 otherwise. 



and in the case of a beta shape. 



16 when N = h"^ 
8 otherwise, 



where [ J is the floor function and [ ] is the ceiling function. In the cases of ^1{Eq^) = 16, there 
is an extra double degeneracy at the end of the paths. The roll-shaped and beta-shaped native 
states for < 24 are illustrated in Fig. [2]-(a) and (b), respectively. 



3.3. Scale fixing by the number of ground states 

If we design Unative with specified ^1{Eqs)^ like a roll or beta shape, the estimated number of 
states, Q*{E) (we denote estimated values calculated by simulations with asterisk (*)), can be 
calculated from g{E) as 

n*{E) = ^^jp^g{E) = XG-og{E). (4) 

Here we defined the scale fixing factor, Acs, by Q,{EQs)/g{EQs)- Since the A^-step Go model 
has the same configuration space as {wjsAWi the whole sum of ^1{E) is equal to cat, and the 
estimated number of A^-step SAWs is given by 



c*^ = Y,^*iE). 

E 



(5) 




Figure 2. (a) Roll- and (b) beta-shaped native states for the Go models with = 3 — 24. The 
start point, uj{0), of each chain is shown as a white circle. The steps with blue numbers have 8 
ground states and the steps with underlined red numbers have 16 ground states. 



The scale of ^}{E) is determined by the set of the ground states {(^}gS) which we call the scale 
fixing set. The idea of the scale fixing with the Go model is illustrated in Fig. [3j This model 
includes {wjsAW as the whole of its configuration space and, if it has a well-defined native state, 
it also includes a scale fixing set {wjcs as a part. 




Figure 3. The idea of scale fixing 
demonstrated with a roll-shaped 
(^native of the Go model (iV = 20). 
This model includes {wjsAW as the 
whole of its configuration space 
and also includes a scale fixing set 
{wIgs as a part. 



E 



4. Multicanonical simulation of the Domb-Joyce model 

4-1- Domb-Joyce model 

In the second enumeration method, we estimate cat by multicanonical simulations of the Domb- 
Joyce model [3]. The configuration space of the Domb-Joyce model is {w}rw In the Domb-Joyce 
model, each path is weighted with the factor 

N~2 N 

n n (i-^-^^j)' (6) 

i=Q j=i+2 

where n is a parameter of the model. If we replace u by 1 — exp(— /3J), each weight is identical 
to the Boltzmann factor of the Hamiltonian, which is written as 

n{uj) = Y, J H^{i),^{j)) = JV, (7) 

i<j 

where V is the number of intersections of a path. In the discussion below, we assume J > 0, 
which means a pair of crossed steps as a repulsive interaction of the strength J. 



In the limit of /3 — ?• (w; — )• 0) , all the configurations are equally weighted and the Domb- Joyce 
model corresponds to A^-step RWs. In the opposite limit, /3 — )• oo (it; — )• 1), the Domb-Joyce 
model reduces to A^-step SAWs. 



4-2. Scale fixing by the total number of N -step random walks 

From the multicanonical simulations of the Domb-Joyce model, we can obtain g{E) = 
H{E)/W{E) {E = JV) of the conformation space {w}rw- Since the total number of 
{ojjpfvv is exactly known as {2d)^ in the (i-dimensional cubic lattice, the universal set of 
the Domb-Joyce model {w}rw can be used as a scale fixing set. Using a scale fixing factor 
-^DJ = Y1,E^^^) I YIe di^)-! calculate the estimated number of states Q.*{E) as 

n*{E) = Adj g{E). (8) 

Then, c*^ is obtained by 

c^ = f^*(0). (9) 

The idea of the scale fixing with the Domb-Joyce model is illustrated in Fig. HI In contrast with 
the Go model, the Domb-Joyce model includes {wjsAW as a part of its configuration space and 
also includes a scale fixing set {cjIrw whole. 




2 4 6 8 10 12 14 16 18 

E 



Figure 4. The idea of scale fixing 
demonstrated with the Domb-Joyce 
model {N = 20). This model 
includes {i^jsAW 

part of 

its configuration space and also 
includes a scale fixing set {wIrw as 
a whole. 



5. Results and Discussions 

We estimated cat up to 144 steps with the Go model and up to 256 steps with the Domb-Joyce 
model with the same amount of computational effort. Thus, we conclude that the Domb- 
Joyce model is more efficient than the Go model for enumerating SAWs. The results are 
shown in Table [T] along with the known exact numbers [2] up to 71 steps, agree with 
the exact numbers within the statistical error. It should be noted that the present methods are 
statistically unbiased. These enumeration methods are highly efficient and successfully counted 
large numbers up to lO^'^^. c'Jj^ is shown in Fig. O 

Using g{E) calculated from the multicanonical simulations, we calculated specific heats of 
the two models (see Fig. [6]and[7|). Based on the size dependence of the specific heats of the Go 
model, we expect that the model exhibits a first-order phase transition at the long chain limit. 
We consider that this transition can partly explain the reason for the performance difference 
between the two developed methods, and we suggest that the performance of the statistical 
enumeration can be evaluated by the thermodynamic behavior of the models. 



Table 1. A table of the estimated number of A^-step SAWs with the A^-step exact number 
of SAWs Cm- The exact and estimated numbers are rounded off to 5 digits or less. The numbers 
in parentheses are the statistical errors for the last digits of the estimated numbers. 
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Figure 5. A plot of (c^)^/^ (pink) 

and Cj!j^ (blue). The error bars were 
shown every 10 steps and their values 
were magnified by a factor of 20. 




Figure 6. Temperature dependence of the specific heats 
of the Go model with (a) roll- and (b) beta-shaped native 
states. 



Figure 7. Temperature de- 
pendence of the specific heats 
of the Domb-Joyce model. 
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Appendix: algorithm of the developed enumeration methods for A^-step SAWs 

A pseudo-code of the Wang-Landau method and the muhicanonical method with a flow chart 

of the whole procedure to calculate with statistical errors. 



Algorithm 1. The Wang-Landau method and the mul- 
ticanonical method (hnes with WL were skipped in the 
latter algorithm) 



Initialization 



1: for the whole range of E do 
2: WiE) ^ 1 
3: H{E)^0 
4: end for 

5: make the first conformation ujo and calculate its en- 
ergy Eo 

6: / <— e > / is a modification factor of the weight 

function W{E). 
7: T 10^ > T is a cycle of judgement for updating 

/• 

8: Aflat ^ 0.7 ~ 0.95 

9: > Aflat is a fiatness parameter of judgment for 

updating /. 

10: t> These values of the initial settings are typical 
values. 



Functions 



11 
12 
13 
14 

15 
16 
17 
18 



function MovE(a;) 

make a candidate state uj' from uj 

returning u' 
end function 

function Energy(cj) 

calculate energy E oi uj from H 

returning E 
end function 



if r < then 



19: function JvDGE{Ea,Et,W{E)) 
20: generate a uniform random number r £ 
[0, 1) (r G 

21: 

22: returning 1 

23: else 
24: returning 

25: end if 
26: end function 



27: 
28: 
29: 
30: 
31: 

32: 
33: 
34: 
35: 
36: 



function \Jpuate(H (E)) 

U _ 
calculate the average of the histogram H{E), H 
for the whole range of E do 

> In some cases, the range of E is limited 

by hand 

if H{E) < H ■ Aflat then 

U ^0 
end if 
end for 
end function 



37: function lNlTlALlZE(7J"(iJ)) 
38: for the whole range of E do 



39 
40 
41 



H{E)^ 
end for 
end function 



Main part 



42 
43 
44 
45 
46 
47: 
48 
49 
50 
51 
52 
53 
54 
55 
56 
57 
58 
59 
60 
61 
62 



while / - 1 > 10"** do 
for t := to T - 1 do 

oj't MovE(a;t) 

E'f -S— ENERGY(tjJ) 

J ^ .]VDGE{Et,E't,W{E)) 

if J = 1 then 



uJt+i ■ 
Et+i 
else 



LUt + l ^ UJt 

Et+1 Et 
end if 
WiEt+i 
H{Et+i] 
end for 

U ^ UPDATE(ff(i;)) 

if (7 = 1 then 
end if 

lNITIALIZE(ff(iJ)) 
end while 



W{Et+^)/f 
H{Et+i) + 1 



i> WL 



> WL 

> WL 

> WL 
WL 
WL 



Learning phase: 

Build up W{E) using the 
Wang-Landau method 



T" 



U = i 



Measurement phase: 

Obtain fi-omthe 
multicanonical method 
using built-up W(E) 




Analysis phase: 

Calculate and its 
standard error from H, {E) 

(i = 1, •■■,«») and W(E) 



Figure 8. Flow chart of the developed enu- 
meration methods for A^-step SAWs. nn is the 
number of Histograms. 
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